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Abstract 


Standard  methods  for  nonlinear  equations  and  unconstrained  minimization 
base  each  iteration  on  a  linear  or  quadratic  model  of  the  objective  function, 
respectively.  Recently,  methods  using  two  generalizations  of  the  standard 
models  have  been  proposed  for  these  problems.  Conic  methods  for  uncon¬ 
strained  minimization  use  a  model  that  is  the  ratio  of  a  quadratic  function 
divided  by  the  square  of  a  linear  function.  Tensor  methods  for  nonlinear  equa¬ 
tions  augment  the  standard  linear  model  with  a  simple  second  order  term.  This 
paper  surveys  the  research  to  date  on  methods  for  unconstrained  minimization 
and  nonlinear  equations  that  use  conic  and  tensor  models.  It  begins  with  a  brief 
summary  of  the  standard  methods,  so  that  the  paper  is  essentially  self- 
contained. 
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1.  Introduction 

The  two  major  unconstrained  nonlinear  algebra  problems  are  the  nonlinear 
equations  problem 

given  F  : find  such  that  F(a:*)=0  (l.l) 

where  we  assume  F€.C^,  and  the  unconstrained  minimization  problem, 

minimize  f 

»ei?"  vJ-*; 

where  we  assume  /eC®.  Computational  methods  exist  that  solve  many  such 

problems  successfully  and  efficiently,  but  research  aimed  at  improving  these 
methods  continues.  In  this  paper,  we  chscuss  two  recently  introduced  classes  of 
algorithms  for  solving  these  problems,  conic  methods  for  imconstrained  minimi¬ 
zation  and  tensor  models  for  nonlinear  equations.  Both  classes  contain  interest¬ 
ing  innovations  and  seem  to  offer  advantages  over  the  standard  methods, 
although  it  is  too  early  to  access  the  ultimate  importance  of  either  one. 

We  assume  the  reader  has  at  least  some  familiarity  with  computational 
methods  for  nonlinear  equations  and  unconstrained  minimization,  although  we 
briefly  summeu"ize  the  leading  methods  in  Section  2.  Some  survey  papers  on 
these  methods  include  Brodlie  [1977],  Dennis  [1977],  Schnabel  [1982a]  and  More' 
and  Sorensen  [1982].  The  books  by  Fletcher  [1980],  Gill,  Murray,  and  Wright 
[1981],  and  Dennis  and  Schnabel  [1983]  contain  a  more  detailed  treatment. 

We  will  denote  the  matrix  of  first  partial  derivatives  of  F  at  z,  the  Jacobian 
matrix,  by  here  F^{x)[i.j]=dfi{x)/  ax [;]  where  fi,  R^-*R  is  the  i*'* 

component  function  of  F(x).  We  will  denote  the  vector  of  first  partial  derivatives 
of  /  at  X,  the  gradient  vector,  by  V/(x)ei?'‘,  and  the  symmetric  matrix  of 
second  partial  derivatives  of  /  at  x,  the  Hessian  matrix,  by  V®/ (x)eF"’’‘": 
Vf  {x)[i]=df  /  dxli}  and  (.x)[i,j]=d^f  /  dx[i]dx[j].  Note  that  we  are  denot¬ 
ing  the  component  of  a  vector  x  by  x[i]  so  that  we  can  reserve  the  notation 
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Xi  for  the  ith  iterate  in  a  sequence  of  vectors 

The  main  difference  between  standard  methods  and  conic  and  tensor 
methods  is  in  the  local  model  of  the  nonlinear  function  that  the  method  uses  in 
determining  its  iterates.  Stemdard  methods  for  nonlinear  equations  base  the 
step  from  the  current  iterate  Xg  upon  a  lineeir  model  of  F{x)  around  Xg , 

M(xg+d)  -  Fixg) -i- Jgd  ( 1 . 3) 

where  d  and  is  F'(xg )  or  some  approximation  to  it.  Similarly,  stan¬ 

dard  methods  for  unconstrained  minimization  base  each  iteration  upon  a  qua¬ 
dratic  model  of  /  (a:)  around  Xg , 

mixg+d)  -  f  (Xc)  +  pjd  +  }^'^Hgd  (1.4) 

where  G/?”  is  V/  (xc)  or  a  finite  difference  approximation  to  it.  and  is 

’^fixg)  or  some  symmetric  approximation  to  it.  These  two  models  are  closely 

related  because  the  minimizer  of  f{x)  must  occur  at  a  point  x«  where 

V/  (x*)=0,  and  the  gradient  of  the  model  (1.4). 

Vm(x,+d)  =  V/  {xg)  -i-Hgd  (1.5) 

is  a  linear  model  of  the  system  of  nonlinear  equations  V/  (x) :  i?” -»/?". 

The  two  new  classes  of  methods  are  based  upon  generalizations  of  (1.3)  eind 
(1.4).  Conic  methods  for  unconstrained  minimization  base  each  step  on  a  model 
of  the  form 


m(x  +d)  =  fix  )  + 

m\xg+d)  f{Xg)+ 


(1.6) 


where  is  sirmmetric  and  6c G/?”.  Tensor  methods  base  each  iteration 


on  a  model  of  the  form 


M{xg+d)  =  Fixg)  +  /cd  +  Tgdd  ’  (1-7) 

where  Tc has  a  particularly  simple  form.  Here  we  use  the  notation  Tgdd 

to  denote  the  vector  in  i?"  whose  component  is 
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(r,cid)[i]=  t  t  n[i,j.k]  d[j]  d[k].  (1.8) 

1  =  1  k  =  l 

Of  course  the  justification  for  either  of  these  models  is  not  obvious  and  we 
explain  it  in  this  paper.  Conic  models  were  introduced  by  Davidon  [1980]  and 
also  have  been  investigated  by  Bjorstad  and  Nocedal  [1979],  Sorensen  [1980], 
Stordahl  [1980],  Davidon  [1982],  Gourgeon  and  Nocedal  [1982],  and  Schnabel 
[l982b].  Tensor  models  were  introduced  by  Schnabel  and  Frank  [1982]  and  also 
are  discussed  in  Frank  [1982].  The  main  goal  of  the  developers  of  tensor 
methods  is  to  improve  the  performance  of  existing  methods  on  problems  were 
/^(x*)  is  singular  or  ill-conditioned,  while  at  least  maintaining  the  performance 
of  the  existing  methods  on  all  other  problems.  The  developers  of  conic  methods 
do  not  seem  to  have  a  similarly  limited  objective. 

The  remainder  of  the  paper  is  organized  as  follows.  Section  2  provides  a 
brief  survey  of  the  leading  standard  methods  for  nonlinear  equations  and  uncon¬ 
strained  minimization,  which  are  based  on  the  models  (1.3)  and  (1.4)  respec¬ 
tively.  These  include  both  the  derivative  methods  used  when  F'{x)  or  V®/  (x)  are 
available  analyticedly  or  from  finite  differences,  and  the  secant  methods  that  are 
used  otherwise.  We  concentrate  on  the  ideas  and  properties  that  are  relevant  to 
our  discussion  of  conic  eind  tensor  methods.  A  reader  familiar  with  these 
methods  should  skip  or  skim  Section  2.  In  Section  3  we  briefly  discuss  several 
extensions  of  the  standeird  methods  that  help  motivate  conic  and  tensor 
methods.  These  are  the  methods  of  Barnes  [1965]  and  Gay  and  Schnabel  [1978] 
for  nonlinear  equations  and  of  Davidon  [1975]  for  unconstrained  minimization. 
They  all  still  use  the  standard  models  (1.3)  and  (1.4),  but  some  of  their  objec¬ 
tives  eind  techniques  are  similar  to  conic  and  tensor  methods.  We  discuss  conic 
methods  in  Section  4,  and  tensor  methods  in  Section  5.  We  comment  briefly  on 
the  application  of  these  two  classes  of  methods  to  other  nonlinear  problems  in 


Section  6. 
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In  our  opinion,  this  paper  covers  most  of  the  important  methods  for  non¬ 
linear  equations  and  unconstrained  minimization  based  on  nonstandard  models. 
There  has  been  occasional  other  work  along  these  lines,  however.  Perhaps  most 
significant  are  the  methods  for  homogeneous  functions  investigated  by  Jacobson 
and  Oxsman  [1973],  Charalambous  [1973],  Kowalik  and  Ramakrishnan  [1976], 
and  others.  These  methods  do  not  seem  to  have  led  to  improved  algorithms  for 
general  classes  of  problems. 


2.  Standard  mod.els  and  methods 

The  fundamental  method  for  solving  the  nonlinear  equations  problem  is 
Newton's  method.  It  consists  of  choosing  the  new  iterate,  as  the  root  of  the 
lineeir  model  of  Fix')  around  , 

M{x^+d)  -  F{Xf.)  +  F{xc)d,  (2.1) 

the  first  two  terms  of  the  Taylor  series.  If  F{x^)  is  nonsingular,  (3.1)  has  a 

unique  root  at 

x^=x,  ~!^ix,)-^F{x^).  (3.3) 

If  F(x*)=0.  f^(x*)  is  nonsingular,  and  f{x)  is  Lipschitz  continuous  in  an  open 

neighborhood  containing  x*.  then  the  sequence  produced  by  iterating  (3.3)  is 

well-defined  and  converges  q-quadratically  to  x*.  provided  the  starting  point 

Xo£/f”  is  sufficiently  close  to  x*.  A  method  that  converges  provided  it  is  started 

sufficiently  close  to  the  solution  is  called  locally  convergent.  (For  our 

definitions  of  rates  of  convergence,  see  for  example  Ortega  and  Rheinboldt 

[1970]  or  Dennis  and  Schnabel  [1983],) 

There  are  four  weaknesses  of  Newton's  method  as  a  computational  pro¬ 
cedure  for  solving  systems  of  nonlinear  equations  that  we  wish  to  discuss.  They 
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£ire 

(1)  The  sequence  of  iterates  may  not  converge  to  any  root  if  Zq  is  not 
sufficiently  close  to  a  root. 

(2)  The  iteration  (2.2)  is  not  well-defined  computationally  if  is  singular  or 

ill-conditioned. 

(3)  Newton’s  method  usually  is  slowly  locally  convergent  or  does  not  converge 

at  all  to  a  root  where  is  singulcir. 

(4)  f^{x)  may  not  be  available  in  practical  applications. 

The  first  difficulty  is  addressed  by  modifying  (2.2)  when  necessary  so  that 
the  method  converges  to  a  root  from  starting  points  outside  the  region  of  local 
convergence.  'This  property  is  called  global  convergence.  The  most  common 
modifications  to  achieve  global  convergence  are  the  line  search,  where  each 
is  chosen  by 

x^=xa  -  \cF'(xc)-‘i^(xc)  (3.3) 

for  some  Xe>0,  and  the  trust  region  approach,  where  is  chosen  by 

z^-zg  -  (F’(xc)^/’'(xc)+ac/)~^F’(xc)^/’(X(,)  (2.4) 

with  agfeO.  In  both  cases,  the  real  valued  parameter  Xc  or  is  selected  so  that 

x^  Is  a  satisfactory  next  iterate,  for  exeimple  so  that  llF(x^.)ll2  <  l|/’(xc)||2.  In  the 
line  search,  Newton’s  method  corresponds  to  =  arid  it  is  guaranteed  that 
||/’(x+)ll2  <  ||/'(x+)||2  for  sufficiently  small  positive  Xg .  In  the  trust  region  formula 
(2.4),  Newton’s  method  is  ac=0,  and  j|F(x4.)||2  <  il/’(xc)||2  is  guaranteed  for 
sufficiently  large  positive  .  Since  the  new  algorithms  use  the  same  types  of 
modifications  to  achieve  global  convergence,  we  do  not  discuss  these  strategies 
further.  Many  of  the  references  listed  in  the  second  paragraph  of  Section  1  con¬ 
tain  information  on  these  strategies. 

Various  modification  may  be  made  to  these  methods  when  f{x^)  is  singuleir 
or  ill-conditioned.  These  include:  i)  replacing  F'(xc)“‘  in  the  line  search  formula 
(2.3)  by  the  pseudo-inverse  F^{xcY,  where  the  pseudo-inverse  of  may  be 

defined  by 


6 


A*  -  lim  {A'^A^yl)  ^2,5) 

ii)  replacing  in  the  line  search  by  {F^ {XqY F* {x^)  +  yl)~^ F' {x^Y  with  an 

appropriate  small  positive  value  of  7;  iii)  using  the  trust  region  iteration  (2.4) 
with  Ob  strictly  positive.  For  further  information,  see  Section  6.5  of  Dennis  and 
Schnabel  [1983].  We  mention  this  difficulty  mainly  because  tensor  models  for 
nonlinear  equations  deal  with  it  nicely. 


If  is  singulEir.  the  convergence  of  the  existing  methods  to  usually 

is  linear  at  best,  even  with  the  above  modifications.  (See  Decker  and  Kelley 
[l9B0a.  1980b.  1982].  Griewank  [1980],  Griewank  and  Osborne  [1981],  Reddien 
[1978,  1980],  Rail  [1966]  for  a  discussion  of  the  convergence  of  Newton’s  method 
on  singular  problems.)  Some  modifications  have  been  proposed  to  speed  conver¬ 
gence  on  singular  problems  (see  many  of  the  same  references),  but  they  mainly 
require  apriori  knowledge  that  F^{xc)  is  singular  and  do  not  seem  suitable  for 
genered  classes  of  problems. 


If  the  Jacobian  matrix  F^{x)  is  not  available  in  analytic  form,  it  may  be 
approximated  by  finite  differences,  meeining  that  the  column  of  F^{xe)  is 
approximated  by 


iJc) 


GolumnjI 


F’jxg+heY  -  F{x^) 
h 


(2.6) 


for  some  small  h£li.  (Here  ej  denotes  the  unit  vector.)  If  the  expense  of 
this  approximation,  n  additional  evaluations  of  F{x)  per  iteration,  is  acceptable, 
this  is  done  and  the  aforementioned  methods  are  used  with  (2.6)  in  place  of 
J^iXc).  If  the  stepsizes  h  are  chosen  correctly,  these  is  little  or  no  deterioration 
in  performance  when  changing  from  analytic  to  finite  difference  Jacobians. 


If  the  additional  cost  of  finite  difference  Jacobian  approximation  is  unac¬ 
ceptable,  then  a  class  of  methods  referred  to  as  secant  (or  quasi-Newton) 
methods  is  used  instead.  These  methods  replace  F'ix^)  in  formulas  (2.2),  (2.5), 
or  (2.6)  by  a  less  precise  approximation  calculated  as  follows.  At  the  first 
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iteration.  Jo  is  the  finite  difference  approximation  to  /’’(xo)'  After  the  step  from 
Zc  to  is  determined,  the  approximation  Jg  to  F^ixg)  is  updated  into  an 
approximation  J+  to  F*(z+).  The  most  commonly  used  updating  rule  is 

iVc 


~  Jn 


s 


(2.7) 


where 


Sg  =  z+  -  Zc .  Vc  =  -  F{x^).  (2.B) 

This  update  was  introduced  by  Broyden  [1965].  It  obeys  the  secant  equation 

/+Sc=?/c.  (2.9) 

the  multi-dimensional  generalization  of  the  standard  one  dimensional  secant 

equation.  For  any  that  obeys  (2.9).  the  new  linear  model  of  F{x)  around  x^., 

M{x^+d)  ~  F(z+)  +  J^d  (2.10) 

obeys 

M{x^)  =  F(z^).  Mix,)  =  Fix,).  (2.11) 

Update  (2.7)  is  selected  because  of  all  the  matrices  obe3ring  (2.9),  /+  given  by 

(2.7)  is  the  closest  to  J,  in  the  Frobenius  norm.  (The  Frobenius  norm  of  a 

matrix  or  tensor  is  the  square  root  of  the  sum  of  the  squares  of  all  the  matrix's 

or  tensor's  components.) 

The  local  method  obtained  by  using  (2.7)  to  calculate  the  Jacobian  approxi¬ 
mations  with  Jq  a  finite  difference  approximation  to  F'(zo),  and  using 

z+  =  Zc  -  J^^Fix,)  (2.12) 

to  calculate  the  steps  is  referred  to  as  Droyden's  method.  It  is  locally  q- 

superlinearly  convergent  to  a  root  z*  under  the  same  assumptions  on  Fix)  and 

z„  stated  above  for  the  q-quadratic  convergence  of  Newton's  method.  Notice 

that  a  secant  method  for  nonlineeu-  equations  requires  the  values  of  F(z)  at  the 

Iterates,  and  no  other  function  or  derivative  values.  In  general,  secant  methods 

for  nonlinear  equations  or  unconstrained  minimization  usually  require  more 

iterations  to  solve  a  particular  problem  than  the  corresponding  analytic  or  finite 
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difference  derivative  method,  but  they  usually  require  fewer  function  evalua¬ 
tions  than  the  finite  difference  method.  Thus  they  usually  Eire  preferred  for 
problems  where  function  evaluation  is  expensive  and  analytic  derivatives  are 
unavailable. 

The  above  discussion  of  secant  methods,  while  cursory,  contains  the  back¬ 
ground  required  for  our  forthcoming  consideration  of  conic  and  tensor  models. 
In  particular,  we  emphasize  the  interpolation  property  (2.11)  that  results  from 
formulas  (3.9)  eind  (2.10).  For  further  information  on  these  methods,  see  Dennis 
and  More’  [1977],  or  the  references  in  paragraph  2  of  Section  1. 

Now  let  us  turn  to  unconstredned  minimization.  Newton's  method  for 
unconstrained  minimization  is  based  on  the  quadratic  model  of  /  (z)  around  Xg , 

m{xg  +  d)  =  /(Zo)  +  +  }^^V®/(zo)ci,  (2.13) 

the  first  three  terms  of  the  Taylor  series.  If  V^/(zc)  is  positive  definite, 

m(zc  +d)  has  a  unique  minimizer  at 

Zt  =  Zc  -  VV  )"*V/  (ze  ),  (2. 14) 

Alternatively,  the  iteration  (2.14)  can  be  derived  by  considering  the  linear  model 

of  V/  (z )  Eiround  Zg , 

M{Xc+d)  =  V/  (Zg)  +  V^/  (zg)ti  (3.15) 

and  selecting  z+  as  the  root  of  Mix^-i-d).  Viewed  in  this  way,  (2.14)  is  just  the 
application  of  Newton's  method  for  nonlinear  equations  to  the  problem  V/  (z)=0. 
Therefore  it  is  locally  q-quadratically  convergent  to  any  point  z*  where 
(a^»)  is  nonsingular,  and  V®/  (z)  is  Lipschitz  continuous  in  an  open 
neighborhood  containing  z*.  Such  a  point  may  be  a  minimizer,  maximizer,  or 
saddle  point  of  /  (z). 

The  four  weaknesses  of  Newton’s  method  for  nonlinear  equations  that  we 
discussed  carry  over  to  Newton's  method  for  unconstrained  minimization,  eind 
the  solutions  are  similar.  Global  convergence  usually  is  achieved  by  modifying 
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(2.14)  to 


or 


3-  ^ S  (^C  ) 


(2.16) 


x^  =  Xe  -  (VV(a;c)+acO  *V/(3:c).  (2.17) 

In.  the  first  case,  -  V®/(^e)  if  is  safely  positive  definite,  otherwise  Bg 

is  some  positive  definite  modification  of  for  example  He  =  {xc)+7l 

with  7  large  enough  to  make  He  positive  definite.  Then  it  is  guaranteed  that 
/  (x^)  <  /  (xg)  for  sufficiently  small  positive  Xg .  In  the  second  case,  ttg  is  nonne¬ 
gative  if  V®/  (xg)  is  positive  definite,  and  leurger  than  the  magnitude  of  the  most 
negative  eigenvalue  of  VV  (®c)  otherwise.  It  is  guaranteed  that  f{x+)  <  /(xg) 
for  sufficiently  large  positive  Og.  The  conic  methods  we  discuss  use  the  same 
strategies;  no  further  understanding  of  these  strategies  is  required  for  the  pur¬ 
poses  of  this  paper. 

Modifications  (2.16)  or  (2.17)  sucessfuUy  deal  with  the  problem  of  defining  a 
satisfactory  step  when  V®/  (xg  )  is  singular  or  ill-conditioned.  However,  standard 
methods  still  usually  converge  linearly  at  best  to  a  point  where  V/  (x*)=0  and 
V®/(**)  is  singular. 

Finally.  Newton’s  method  for  unconstrained  minimization  requires  both  the 
gradient  vector  Vf(x)  and  the  Hessiein  matrix  V®/(x).  If  the  gradient  is  not 
available  analytically,  it  must  be  approximated  by  finite  differences  since  accu¬ 
rate  gradient  values  are  essential.  If  the  Hessian  matrix  is  not  available.  V®/(x) 
is  replaced  by  a  finite  difference  approximation  if  evaluation  of  f  (x)  is  inexpen¬ 
sive.  by  a  secant  approximation  otherwise.  Secant  approximations  for  uncon¬ 
strained  minimization  are  derived  similarly  to  Broyden's  update  for  nonlinear 
equations.  After  a  step  from  Xg  to  x^.,  the  approximation  Hg  to  V®/(Xg)  is 
updated  into  an  approximation  H^.  to  V®/  (x+)  obeying 


Bi-Sc  =  Ve 


(2.18) 
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where 

Sg  —  ~  Xg  ,  J/g  —  Vjf  (x  +)  Vf  (Xg  )  .  (3.  19) 

Thus  the  quadratic  model 

m(x4.+d)  =  /(x+)  +  V/  (x+)^cZ  +  H^d  (3,30) 

satisfies  the  interpolation  conditions 

m(xt)  =  /  (x+),  Vm:(x+)  =  V/(x+).  Vm(xc)  =  V/ (xg).  (3.31) 

In  addition,  is  chosen  to  be  symmetric  since  V^/(x)  always  is  symmetric. 

Still,  many  symmetric  i/+.  satisifying  (3.18)  exist;  the  most  used  choice  is  the 

Broyden-Fletcher-Goldfarb-Shanno  (BFGS)  update 


TT  -  TJ  1  HgSgSjHg 

*  ~  ®  yjsg  sJ’/ZgSg 

(3.33) 

If  Hg  is  positive  definite  and 

sJvg  >  0, 

(3.33) 

is  positive  definite  as  well.  In  practice  the  initial  approximant  Hq  is  chosen" 
to  be  positive  definite  and  the  step  selection  strategy  enforces  (3.33),  so  all  the 
BFGS  approximants  to  the  Hessian  are  positive  definite.  This  simplifies  the 
modifications  required  to  achieve  global  convergence.  The  local  method  result¬ 
ing  from  using  (3.33)  to  define  the  Hessian  approximations  and 

x^.  =  xg  -/fg-iV/(xg)  (3.34) 

to  define  the  steps  is  locally  superlinearly  convergent  to  a  point  x,  where  /(x) 

andx*  obey  the  conditions  for  the  q-quadratic  convergence  of  Newton’s  method, 

If  in  addition  7^/  (x,)  is  positive  definite  and  Hq  is  sufficiently  close  to  V®/  (xq). 

In  summary,  we  divide  the  methods  considered  in  this  paper  into  the  four 
categories  given  in  Table  3.1  below.  Table  3.1  also  lists  the  information  interpo¬ 
lated  by  the  local  models  at  Xg  used  by  the  standard  methods  for  each  category. 
We  denote  the  iterate  before  Xg  by 

In  Section  4  we  also  refer  to  einother  type  of  method  for  unconstrained 
minimization,  conjugate  direction  methods.  These  methods  are  related  to 
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Table  2.1  ~  Categories  of  methods  considered  in  this  paper 

(1)  First  derivative  methods  for  nonlinear  equations 
local  model  at  interpolates  F{xc),  F^(xc) 

(8)  Secant  methods  for  nonlinear  equations 

local  model  at  Xg  interpolates  F{Xg),  F{xj^y) 

(3)  Second  derivative  methods  for  unconstrained  minimization 

local  model  at  Xg  interpolates  f{xg),  V/  {xg),  (Xg) 

(4)  Secant  methods  for  unconstrained  minimization 

local  model  at  Xg  interpolates  /  (xc),  7/  (xc),  V/  (xpj^„) 


secant  methods  for  unconstrained  minimization  in  that  they  use  function  eind 
gradient  information  only.  They  differ  in  that  do  not  use  any  approximation  to 
the  Hessian,  and  require  only  0(n)  storage.  Thus  they  are  mainly  intended  for 
problems  where  n  is  large  and  the  use  of  0(n®)  storage  locations  is  undesirable. 
Many  of  them  have  the  property  that  if  /  (x)  is  a  positive  definite  quadratic,  then 
the  iterate  of  the  method  will  be  the  minimizer  x*.  Space  does  not  permit 
us  to  describe  conjugate  direction  methods  further  here;  they  are  described 
thoroughly  in  Fletcher  [1980],  Hestenes  [1980],  and  Gill.  Murray,  and  Wright 
[1981]. 

The  conic  and  tensor  methods  to  be  described  are  based  on  generalizations 
of  the  models  discussed  in  this  section.  As  motivation,  Table  2,2  summarizes  the 
properties  of  a  good  model. 
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Table  2.2—  Properties  of  a  good  model  for  nonlinear  equations  or 
unconstrained  minimization 

( 1)  The  model  should  interpolate  useful  information. 

(2)  The  model  should  be  a  useful  approximation  to  the  problem. 

(3)  The  model  should  be  easy  to  form. 

(4)  The  model  should  be  easy  to  solve. 


Conic  and  tensor  models  aim  to  improve  properties  1  and  2  without  seri¬ 
ously  harming  properties  3  and  4.  The  hope  is  that  the  additional  costs  incurred 
in  items  3  and  4  will  be  offset  by  gains  In  the  efficiency  or  success  rate  of  the 
algorithm.  As  a  point  of  reference,  Table  2.3  summarizes  the  costs  that  may  be 
used  to  measure  the  efficiency  of  algorithms  for  nonlinear  equations  or  uncon¬ 
strained  minimization,  and  where  applicable,  the  costs  incurred  by  the  standard 
methods. 


Table  2.3  -  Costs  of  solving  nonlinear  equations  or  unconstrained  minimization 
problems  by  standard  methods 

(1)  Algorithmic  overhead  (dominated  by  cost  of  solving  the  linear  system 
to  find  the  Newton  or  secant  step;  0{n^)  for  derivative  methods,  0(n^ 
for  secant  methods) 

(2)  Storage  (between  nV  2  and  2n^  locations) 

(3)  Evaluations  of  Fix)  or  /  (a:),  and  possibly  derivatives 


In  many  practical  applications,  the  evaluations  of  f^(x)  or  f  (x)  are  very 
expensive  and  are  the  dominant  cost.  Therefore  when  assessing  the  efficiency  of 
new  methods,  it  is  desirable  that  they  solve  problems  using  fewer  evaluations  of 
the  nonlinear  function.  It  also  is  important,  however,  that  they  do  not  appreci¬ 
ably  increase  the  algorithmic  overhead  or  storaige  requirements  of  the  standard 
algorithms. 
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3.  Interpolating  additional  information  using  standard  models 

Conic  and  tensor  models  interpolate  more  function  and  derivative  informa¬ 
tion  than  the  standard  models  listed  in  Table  2.1,  by  using  a  more  general 
model.  In  one  of  our  four  problem  classes,  secant  methods  for  nonlinear  equa¬ 
tions,  it  is  in  fact  possible  to  interpolate  additional  information  using  the  stan¬ 
dard  model.  For  unconstrained  minimizatioa  this  is  only  possible  in  general  for 
quadratic  objective  functions.  We  briefly  discuss  these  ideas  to  motivate  further 
conic  and  tensor  methods. 

Secant  methods  for  nonlineeir  equations  use  the  model 

M{x^+d)  =  F{x^)  +  J+d  (3.1) 

to  model  F{x)  eiround  x+.  The  secant  equation 

(3.2) 

guareintees  M{xg)  =  F{Xc).  Suppose  we  also  want  the  model  to  interpolate  the 
function  values  F{x^)  at  some  previous  iterates  x_i,  i  =  l.  ■  ■  ,p.  This  requires 

F(x-i)  =  F(x+)  +  /+(x_i-x+)  (3.3) 

or 

J^Si=yi  (3.4) 

where 

Si  =  -  x_i,  Vi  =  F{x^.)  -  F’(x_i).  (3.5) 

Since  /+  is  an  nxn  matrix,  we  may  satisfy  (3.2),  plus  (3.4)  for  up  to  n-l 
values  of  i.  as  long  as  Sc  ■  ‘  ‘  >  ^n-i  linearly  independent.  This  possiblity, 

and  a  generalization  of  Broyden's  method  that  achieves  it,  was  first  proposed  by 
Barnes  [1965].  However  the  method  was  not  very  successful  in  practice:  prob¬ 
lems  arose  when  the  directions  to  the  past  points  Sg ,  Sj,  •  •  •  ,  s„_i  were  linearly 
dependent  or  close  to  being  dependent.  Gay  and  Schnabel  [1978]  revived  and 
modified  Barnes’  idea.  By  limiting  the  set  of  additional  past  function  values  to 
be  interpolated  to  p<Ti.  points  for  which  Sg,  Sj,  ■  •  •  .  are  strongly  linearly 
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independent,  they  were  able  to  construct  a  locally  q-superlinearly  convergent 
algorithm  that  appears  quite  competitive  with  a  standard  Broyden’s  method 
algorithm  in  practice.  We  do  not  discuss  their  method  further  here.  We 
emphasize,  however,  the  two  ideas  that  will  recurr  in  the  forthcoming  methods: 
i)  using  the  model  to  interpolate  function  values  at  previous  iterates,  eind  ii)  lim¬ 
iting  these  previous  iterates,  possibly  in  a  fairly  restrictive  manner. 

The  obvious  extension  of  the  idea  of  Barnes  and  Gay  and  Schnabel  to  uncon¬ 
strained  minimization  does  not  work  in  general.  The  extension  would  be  to  ask 
the  secant  model  of  /  {x)  around  x+, 

Vm{x^.+d)  =  Vf  (a+.)  +  H+d  (3.6) 

to  interpolate  V/(x_i)  at  some  previous  iterates  x_i  as  well  as  interpolating 

V/  (xc )  at  Xc .  The  difficulty  comes  from  the  required  symmetry  of  Suppose 

for  the  model  (3.6),  Vm(xg)=V/  (xp)  and  Vm(x_i)=V/  (x_j).  This  would  require 

Hi.Sc=yc  and  (3.7) 

where  and  Sj  are  defined  as  above  and 

1/c  =V/(x^)-V/(x„).  y,  =  V/(x^)-V/(x_i).  (3.8) 

Since  //+  is  symmetric,  (3.7)  would  imply 

sfl/c  =  •  (3.9) 

since  both  sides  of  (3.9)  equal  While  (3.9)  is  satisfied  if  /(x)  is  a  qua¬ 

dratic,  it  is  not  satisfied  in  general  for  nonquadratic  functions. 

Davidon  [1975]  proposed  a  secant  method  for  unconstrained  minimization 
that  interpolates  up  to  n  past  gradients  when  f{x)  is  quadratic,  and  suggested 
an  extension  to  general  objective  functions.  Schnabel  [1977]  studied  Davidon's 
method  and  proposed  severed  other  extensions.  None  of  these  have  proven  supe¬ 
rior  to  the  standeird  seceint  methods  for  unconstrained  minimization  in  practice. 
The  desire  to  interpolate  additional  previous  function  or  gradient  information 
partiedly  motivates  some  of  the  conic  methods  for  unconstrained  minimization 
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that  we  discuss  next. 


4.  Conic  methods  for  unconstrained  minimization 


The  use  of  conic  models  in  unconstrained  minimization  algorithms  was  first 
proposed  by  Davidon  [1960]  and  much  of  the  following  background  material  is 
contained  in  his  paper.  A  conic  function  has  two  equivalent  algebraic  forms. 
One  is  the  ratio  of  a  quadratic  function  divided  by  the  square  of  a  linear  func¬ 
tion, 


c(d)  = 


f  +  -t-  WBd 


where  /  ei?,  a,d,heE^.  Equation  (4.1)  is  equivalent  to 


where 


)^d^i4d 
{l  +  b^df 


(4.1) 


(4.2) 


g=h-fb,  A  =  B  -  gb"^  -  bg"^  -  Zfbb^ .  (4.3) 

The  form  (4-2)  is  used  for  the  remainder  of  this  section.  The  function  c  (d)  is 

called  a  conic  because  its  level  sets  are  conic  sections,  i.e.,  circles,  ellipses,  par¬ 


abolas.  or  hyperbolas.  Figure  1  is  an  example  of  a  conic  function  in  one  variable. 
Note  the  discontinuity  in  c(d):  in  general,  function  (4.2)  is  discontinuous  along 
the  n-1  dimensional  hyperplane  [d  1  l+6^d=0j.  called  the  Jwrizon  of  c  (d).  The 
vector  b  is  called  the  gauge  vector. 


The  conic  function  (4.2)  is  related  to  the  quadratic 

c{s)  -  f  +  g'^s  + 


by 


s  = 


l  +  b'^'d’ 


(4.4) 


(4.5) 


Equation  (4.5)  is  known  as  a  collinear  scaling  of  d;  if  d  and  s  are  related  by  (4.5) 


then 


(4.6) 


Collineeir  scedings  map  straight  lines  to  straight  lines,  affine  subspaces  to  affine 
subspaces,  and  most  generally,  convex  sets  to  convex  sets.  Furthermore,  they 
are  the  most  general  transformation  with  these  properties.  (A  collinear  scaling 
may  have  a  slightly  more  general  form  than  (4.5);  see  Davidon  [1960].)  Of 
course,  the  mapping  (4.5)  has  the  discontinuity  mentioned  above.  The  relation¬ 
ship  between  conics  and  quadratics  may  be  used  to  derive  some  of  the  conic 
algorithms  described  in  this  section,  particularly  the  conjugate  direction  algo¬ 
rithms  of  Davidon  [1982]  and  Gourgeon  and  Nocedal  [1982]. 
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To  justify  the  use  of  the  conic  function  (4.S)  in  unconstrained  minimization 
algorithms,  first  we  must  ask  whether  a  model  of  this  form  is  appropriate  for  use 
i^  rnimmization  algorithms,  and  then,  to  what  use  the  extra  degrees  of  freedom 
in  the  model  may  be  put.  The  answer  to  the  first  question  is  obtained  using  the 
same  techmque  that  may  be  used  to  find  the  minimzer  of  a  quadratic.  Since  for 
nonsingular  A,  (4.2)  can  be  written 


c(d)  =  J^ 


’if 


+  if  -}^'^A  ^g).  (4,7) 


c(d)  has  a  unique  minimer  only  if  A  is  positive  defimte.  In  this  case,  the  minim¬ 
ize  r  is  any  d  satisfying 


if  (4.8)  has  a  solution.  From  (4.6).  the  solution  to  (4.0)  is 


d  = 


(4.8) 


(4.9) 


1+6  ^.4  ^g 

as  long  as  l+6^4~*g^i^0.  So  roughly  speaking,  a  conic  model  has  a  unique 
minimizer  in  almost  the  same  cases  as  a  quadratic  does,  that  is,  when  the 
matrix  in  the  model  is  positive  definite. 


To  use  a  conic  function  of  form  (4.2)  as  a  model  in  a  minimization  algo¬ 
rithm,  presumably  it  should  interpolate  some  function  and  derivative  values  of 
/  (z).  The  first  two  derivatives  of  c(d)  are 


Vc (d) = 


1 

r  6d^ 

1+6  ^d 

l+6^d 

o  1 

1+6  ^d 

(4.10) 


V2c(d)= 

'  ^  {l+6^d)2 


24g6^+26.t?^i4 

(l+6^d)3 


66^ 


(1+6  ^d)3 


2g^^d  + 


Sd^Ad 


1+6  7'd 


Thus 


.(4.11) 


Vc(0)=g,  V^c(O)  =  A  -  bg^  -  gb^.  (412) 

Therefore  to  model  /  (z)  around  Zg  with  a  conic  model,  one  should  use  a  model 


of  the  form 


IB 


since  it  satisfies 


m(x.+d)  =  /(i,)-h 


^id'^Acd 


(4.13) 


Mp^c )  =/  {xc  ).  Vm(a:e  ) =V/  {x^ ) ,  (4. 14) 

for  any  and  /4c e/?"*'".  From  (4.12),  the  second  derivative  matrix  of  this 

model  is 


V2m(a:c  )=4  -6c  V/  (x^  )^-^f  (zc)6 J.  (4. 15) 

Most  of  the  research  on  conic  methods  is  based  on  a  model  of  form  (4.13). 
If  Ag  is  positive  definite  and  1  +  bgA^^Vf  (xj)  ^  0,  it  has  a  unique  minimizer  at 


l+6c^/4c-‘V/(Xc)  ' 


(4.16) 


the  minimizer  is  on  the  same  side  of  the  horizon  as  Xc  if  and  only  if 
l+6//4c“*V/ (xc)  >0.  In  the  remainder  of  this  section,  we  describe  how  several 
authors  have  used  conic  models  in  unconstrained  minimization  algorithms.  This" 
includes  the  choice  of  the  parameters  Ag  and  bg  in  (4.13).  It  is  important  to 
note  that  the  matrix  Ag  wil  be  different  than  the  matrix  used  in  standard 
methods  for  unconstrained  minimization.  Thus  the  direction  as  well  as  the  mag¬ 
nitude  of  the  step  to  the  minimizer  of  the  conic  model  will  be  different. 


Davidon  [1980]  and  Sorensen  [1980]  consider  the  use  of  a  conic  model  in  a 
secant  method  for  unconstrained  minimization.  As  in  our  discussion  of  secant 
methods  in  Section  2,  assume  we  have  teiken  a  step  from  Xg  to  x+,  and  are  con¬ 
structing  a  new  conic  model  of  /  (x)  around  x^.. 


'■/  .X  -/  \  V/(x+)^£i  A+d 

m(x^.+ci)-/(x+)+  (l+6j’d)2 


(4.17) 


Davidon  and  Sorensen  show  how  to  use  this  model  to  obtain 


m(Xc  )  =  /  (xc).  V?n(xc  )  =  V/  (xg  ).  (4-  IB) 

as  well  as  rh{x^)  =  /(x+)  and  Vm,{x^)  -  Vf  (x+)  which  are  satisfied  for  all  values 
of  6+  and  /4+.  Let  Sg  =x+-Xc,  The  first  interpolation  condition  in  (4.18) 
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requires 


=f{x+)  - 

while  the  second  is  satisfied  if 


1-6  Is, 


(i-6|:s,)^ 


V/(Xe)  = 


1-6+Sc 


!/  + 


6+sJ 


l-6isc 


y/  (x^)  - 


As. 


l-b+Sc 


(4.19) 


(4.20) 


l«t  p  -  l-bl’sg.  Teiking  the  inner  product  of  (4.20)  with  Sg  and  then  using  (4.19) 
to  eliminate  the  Sg^.4tSg  term  yields 


p  [P\^f  {x, )  -  2pif  {x^)-f  (xg ))  ■».  sj^f  (x^.)]  =  0.  (4,21) 

If  (4.21)  has  a  nonzero  real  solution  (this  can  be  assured  by  the  choice  of  Xg  in  a 

line  search  algorithm),  then  any  6+  that  satisifies 


bls^  =  l-p  (4.22) 

together  with  any  4+  that  satisfies 

A,s,  =  pVf  (x^)  -  p^Vf  ix,)+p%  ^s^f  (xg  )  (4.23) ' 

causes  the  model  (4.17)  to  satisfy  (4.  IB). 

Thus  by  using  (4.21),  (4.22),  and  (4.23),  the  conic  model  (4.17)  satisfies  the 
three  interpolation  conditions  (2.21)  satisfied  by  standard  secant  methods  for 

unconstrained  minimization,  plus  m(xg)  =/(xg).  This  causes  the  step  to  the 
minimizer  of  the  conic  model  to  depend  on  function  as  well  as  gradient  values, 
whereas  in  a  standard  secant  method,  the  minimizer  of  the  model  is  determined 
solely  by  gradient  values.  This  fact  is  cited  by  several  authors  as  a  reason  why 
eilgorithms  based  on  conic  models  should  be  more  efficient  than  the  correspond¬ 
ing  cdgorithms  based  on  quadratic  models. 

Sorensen  [1980]  proves  the  local  q-superlinear  convergence  of  an  algorithm 
of  the  type  we  have  just  described.  His  algorithm  assures  that  all  the  matrices 
A^  are  positive  definite  through  the  choice  of  the  line  search  parameter,  the 
proper  choice  from  among  the  two  nonzero  roots  of  (4.21),  and  by  using  BFGS 
updates.  The  parameter  6+  is  chosen  in  the  direction  V/(xg).  Sorensen  also 


30 


presents  some  promising  test  results,  but  they  are  from  a  very  limited  set  of 
tests,  and  considerably  more  testing  would  be  required  to  establish  the  utility  of 
this  approach. 

Davidon  [1980]  also  shows  how  A+  and  6+  may  be  chosen  so  that  the  model 
satisfies  the  additional  interpolation  conditions 

m(x_i)  = /(a:_i)  and  Vm{x^)  =  Vf  (x^),  i  =  l,  ■  ■  ■  ,n-l  (4.24) 
when  / (x)  is  a  conic  function,  where  as  in  Section  3  \x^l  are  past  iterates.  For 

nonconic  functions  this  is  not  usually  possible,  however,  and  to  our  knowledge, 

this  idea  has  not  yet  been  used  to  make  a  general  purpose  unconstrained 

minimization  algorithm. 

Schnabel  [1982b]  and  Stordahl  [1980]  study  the  use  of  the  model  (4.13) 
when  the  Hessian  matrix  is  available  analytically  or  by  finite  differences.  The 
model  they  use  is 


m{xg+d)  = 

,  ^/(3^e)’”rf  /^cJ^(V^/(a:c)+f>cV/(xc)^+V/(xc)6/)(i 

From  (4.14)  and  (4.15),  (4.25)  satisfies 


(4.25) 


^(^^C  )  =  /  (*C  ) .  Vto(xc  )  =  V/  (xc  ) .  V®m(Xc )  =  vV  i^c )  ('^- 

for  any  65  e/?'* .  The  vector  be  then  may  be  chosen  to  allow  the  model  to  interpo¬ 
late  additional  information.  Schnabel  and  Stordahl  impose  the  requirement 

m(x_i)  =/(x_i),i  =  l,  ■  •  •  ,p  (4.27) 

atp^sn  past  iterates  x^.  Substitution  into  (4.25)  shows  that  this  model  satisfies 

»^(®-i)  =/(®-i)  if 


b/si  =  fff  =  (4,28) 

.  ■  V/(^c)^^i-V(V/(Xc)^s,)^4(/(x_,)-/(x,))(}^,^V/(x,)s,+V/(x,)^s,) 

/(a:-i)  -f(^c) 


where 
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Si=Zc-x^.  (4.29) 

The  term  Inside  the  square  root  in  (4.28)  may  be  negative  but  rarely  is  in  prac¬ 
tice.  Since  (4.28)  only  determines  the  projection  of  b^.  in  the  direction  s^,  the 
conic  model  (4.25)  may  interpolate  up  to  n  past  function  values  /(^_i)  as  long 
as  the  directions  to  the  past  points.  Si,  are  linearly  independent.  If  /  (a:)  is  qua¬ 
dratic,  the  right  hand  side  of  (4.28)  is  zero  so  the  choice  =0  is  allowed. 

Schnabel  and  Stordahl  tested  an  algorithm  based  on  the  above  model.  They 
found  it  advantageous  to  limit  the  number  of  past  function  values  interpolated 
at  any  iteration  to  p^'Jn ,  and  to  require  the  directions  Si  to  the  past  points  to 
be  strongly  linearly  independent  of  each  other;  roughly  speaking,  each  direction 
Sj  that  is  used  must  make  an  angle  of  at  least  45  degrees  with  the  linear  sub¬ 
space  spanned  by  the  other  directions  Jsj-J  that  are  used.  They  chose  b^  to  be 
the  minimum  Ig  norm  solution  to  the  underdetermined  system  of  equations 

bjsi  =  Oi,  i  =  l.  ■  ■  ■  ,p.  (4.30) 

Schnabel  and  Stordahl  compared  their  algorithm  to  an  algorithm  that  used 
the  standard  second  derivative  quadratic  model  but  was  identical  in  all  other 
respects,  on  the  unconstrained  minimization  test  problems  in  More’,  Garbow, 
and  Hillstrom  [1981].  Each  algorithm  used  the  strategy  in  Dennis  and  Schnabel 
[1983]  to  augment  V/  (xg)  or  to  be  positive  definite  when  necessary,  and  then 
chose  the  next  iterate  using  a  line  search.  Out  of  32  test  runs,  the  conic  algo¬ 
rithm  was  more  efficient  (in  iterations  and  function  evaluations)  on  19,  less 
efficient  on  11,  and  tied  on  4;  on  the  average,  the  conic  algorithm  required  16% 
fewer  iterations  and  21%  fewer  function  evaluations  than  the  quadrtic  algorithm. 
It  is  not  clear  whether  these  results  justify  the  added  complexity  of  the  conic 
method. 

Another  possibility  in  using  model  (4.25)  is  to  choose  bg  so  that  Vm(x_t) 
=:  V/(x_i)  where  x_i  is  the  most  recent  past  iterate.  This  is  more  in  keeping 
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with  the  philosophy  of  the  algorithms  of  Davidon  and  Gourgeon  and  Nocedal  dis¬ 
cussed  next,  as  it  implies  that  m{x)  =  f  (x)  if  f  (x)  is  a  conic  function.  We 
currently  are  investigating  this  approach. 

Davidon  [1982]  and  Gourgeon  and  Nocedal  [1982]  have  proposed  a  conjugate 
direction  method  based  on  conic  models.  It  is  a  generalization  of  the  standard 
conjugate  gradient  method  for  minimizing  quadratics  (Hestenes  and  Stiefel 
[1952])  and  finds  the  minimizer  of  a  conic  function  in  at  most  n  iterations.  An 
extension  of  these  methods  to  nonconic  functions  has  not  yet  been  developed, 
however,  so  no  comparison  with  existing  conjugate  direction  methods  for  uncon¬ 
strained  minimization  is  possible. 

Since  we  do  not  include  a  thorough  treatment  of  conjugate  direction 
methods  in  this  paper,  we  just  give  a  brief  description  of  Davidon’s  and  Gourgeon 
and  Nocedal's  algorithms.  The  key  feature  is  that  they  show  that  if  /  (x)  is  a 
conic  function,  then  given  the  values  of  /(x)  and  V/(x)  at  any  three  collinear 
points,  the  value  of  the  gauge  vector  b  can  be  determined.  The  value  of  the 
gauge  vector  changes  as  the  conic  is  expanded  around  different  points,  but  only 
by  scalar  multiples  that  Davidon  and  Gourgeon  and  Nocedal  also  show  how  to  cal¬ 
culate.  Using  this  information,  they  are  able  to  generalize  the  conjugate  gra¬ 
dient  algorithm  to  exactly  minimize  the  conic  function  along  a  line  d/g  at  the 
iteration,  while  also  choosing  d*  so  that  the  next  iterate  x^,^^  is  the  minimizer  of 
the  conic  in  the  affine  subspace  spanned  by  the  A:— 1  previous  step  directions  di, 

•  •  •  ,  as  well.  Thus  the  algorithm  minimizes  a  conic  function  in  n  or  fewer 
iterations,  without  storing  or  using  an  nxn  matrix.  Gourgeon  and  Nocedal  point 
out  several  possible  numerical  instabilities  in  implementing  this  algorithm,  and 
seem  to  remedy  them  satisfactorily. 

Davidon  [1982]  also  proposes  a  related  algorithm  that  again  minimizes  a 
conic  function  in  n  or  fewer  iterations,  and  edso  accumulates  an  approximation 
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to  the  matrix  A  in  (4.2)  as  it  proceeds.  It  /(x)  is  conic,  after  n  iterations  this 
approximation  is  =  V®/  (x*).  Gourgeon  and  Nocedal  state  that  the  sequence 
of  points  generated  by  their  algorithm  eind  Davidon's  two  algorithms  is  identical 
for  conic  functions,  if  exact  arithmetic  is  used.  Davidon’s  secant  algorithm 
probably  could  be  extended  into  a  new  secant  conic  algorithm  for  unconstrained 
minimization.  This  extension  would  require  the  evaluation  of  /  (x)  and  V/  (x)  at 
three  collinear  points  on  some  or  all  iterations,  as  would  an  extension  of  the  con¬ 
jugate  direction  algorithms  of  Davidon  or  Gourgeon  and  Nocedal  to  nonconics. 

Finally.  Bjorstad  and  Nocedal  [1979]  show  that  the  secant  conic  algorithm 
originally  proposed  by  Davidon  converges  quadratically  under  reasonable 
assumptions  when  applied  to  one  dimensional  problems.  This  rate  is  faster  than 
the  order  (l+Vs/  2)  21.61  convergence  rate  of  the  secant  method  on  one  dimen¬ 
sional  problems.  Similarly,  the  second  derivative  conic  method  of  Schnabel  and 
Stordahl  converges  with  order  l+VS  22.41  on  one  dimensional  problems,  com¬ 
pared  to  the  quadratic  convergence  of  Newton’s  method.  These  results  probably 
have  practical  importance  only  if  they  extend  to  multiple  dimensions,  which  is 
very  unlikely. 


5.  Tensor  methods  for  nonlinear  equations 

Tensor  models  for  nonlinear  equations  augment  the  standard  linear  model 
of  F{x)  Eiround  x^  by  a  second  order  term.  The  most  obvious  tensor  model  is 
the  first  three  terms  of  the  Taylor  series, 

M{x^  +d)  =  F{x^)  +  F'{x,)d  +  }^f'(xa)dd,  (5.1) 

where  However  the  use  of  model  (5.1)  in  a  nonlinear  equations  algo¬ 

rithm  would  violate  many  of  the  principles  expressed  at  the  end  of  Section  2.  In 
particular,  the  model  would  require  at  least  0{n^)  additional  operations  to  form 
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and  roughly  nV  2  additional  locations  to  store,  and  finding  a  root  of  the  model 
would  require  solving  a  system  of  n  quadratic  equations  at  n  unknowns  at  each 
iteration.  Any  of  these  costs  clearly  is  unacceptable. 

Instead.  Schnabel  and  Frank  [1982]  have  proposed  the  use  of  a  model  of  the 
form 

Mix,  +d)  =  Fix, )  +  J^ix,  )d.  +  YiT,  dd  (5-2) 

where  T,  has  a  particulary  simple  form.  In  particular,  the  additional 

costs  of  forming,  storing,  and  finding  a  root  of  the  tensor  model  all  are  small  in 
comparison  to  the  corresponding  costs  edready  required  by  Newton’s  method. 
In  this  section,  we  summarize  how  Schnabel  and  Frank  determine  the  term  T,  in 
(5.2),  how  the  resultant  model  is  solved  efficiently,  and  how  a  nonlinear  equa¬ 
tions  method  utilizes  this  method.  Finally,  we  summarize  some  of  their  test 
resists. 

The  main  motivation  for  the  work  of  Schnabel  end  Frank  was  to  construct  a 
general  purpose  method  that  would  be  more  efficient  than  standard  methods  on 
problems  where  F'ix^)  is  singuleir  or  ill-conditioned,  and  at  least  as  efficient  as 
standard  methods  on  all  other  problems.  An  early  version  of  their  work  is 
reported  in  Frank  [1982].  For  practical  purposes,  it  probably  would  be  more 
desirable  to  have  a  secant  tensor  method  that  does  not  require  values  of  F*ix)\ 
we  currently  are  working  on  such  an  extension. 

To  determine  the  tensor  term  T,  in  (5.2),  we  require  the  model  Mix,'^d)  to 
interpolate  the  function  values  /’(x-i)  atp:sVn  previous  iterates  x_^,  ■  ■  ■  ,  x_p. 
Substituted  into  (5.2),  this  requirement  is 

Fix-i)  =  Fix,)-F'ix,)si  +  yiT,SiSi,  i-1,  ■  •  •  .jo<Vn  (5.3) 

where 

Si  =  X,  —  x_i.  (5.4) 

The  upper  bound  of  Vn  past  points  was  suggested  by  the  computational 
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iexperience  of  Schnabel  and  Stordahl  mentioned  in  Section  4  for  conic  methods 
!that  use  past  function  values;  it  also  is  required  to  keep  the  costs  of  forming  and 
;solving  the  tensor  model  small.  The  past  points  are  selected  by  the  same 
(method  as  is  used  in  Schnabel  and  Stordahl’s  conic  code.  At  each  iteration,  the 
imost  recent  past  point  is  selected.  Then,  for  i=2,  •  •  ■  ,  y/n  ,  the  past  point  is 
(selected  if  it  makes  an  angle  of  at  least  45  degrees  with  the  affine  subspace 
(Spanned  by  the  already  selected  subset  of  the  1®*  through  past  points. 

The  interpolation  conditions  (5.3)  result  in  np  linear  equations  in  the 

elements  of  Tg,  meaning  that  Tc  is  underdetermined.  Following  successful 
(precedent  in  determining  secant  updates  for  nonlinear  equations  (see  e.g. 
iDennis  and  Schnabel  [1979]),  Schnabel  and  Frank  choose  the  Tc  that  solves 


minimize  llTcHj? 

subject  to  TcS-^s^  “  2(F'(^— x)  ’  *  *  tP 


(5.5) 


(Where  j|-||j?  denotes  the  Frobenius  norm.  They  show  that  the  solution  to  (5.5)  has 
the  form 


^  C^'®) 

t=l 

'Where  cqe/?",  i  =  l,  ■  ■  ■  ,p  and  cq  Sj  denotes  the  rank  one  tensor  whose  i,j.k 
'element  is  ait'i]  sxlji]'Si[A:].  Thus  Tg  is  a  rank  p  tensor.  The  p  vectors 
.are  calculated  by  solving  one  sjrmmetric  and  positive  definite  pxp  system  of 
ilineeir  equations  with  n  different  right  hand  sides,  requiring  a  total  of  np®  <  n® 
'each  multiplications  and  additions. 

'Given  the  form  (5.6)  of  Tg,  the  tensor  model  (5.2)  becomes 


M{xc-{-d)  =  F{xg)  +  I^{xc)d  +  Oj  (sfd)®.  (5.7) 

i=l 

Therefore  a  maximum  of  2n'  ®  additional  storage  locations  are  required  for  the 
: tensor  term,  to  store  thep  and  vectors.  (The  x^  and  vectors  can  share 
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storage.)  The  dominant  arithmetic  cost  in.  forming  the  tensor  term  is  the  n®p 
multiplications  and  additions  required  to  form  the  p  right  hand  sides  of 
(5.5).  Neither  of  these  costs  is  significant  in  comparison  to  the  n®  locations  and 
0(n^)  arithmetic  operations  required  to  store  and  solve  the  standard  linear 
model. 

«Ai 

To  use  M(xg+d)  given  by  (5.7)  in  an  algorithm  for  nonlinear  equations, 

presumably  we  need  to  be  able  to  find  a  for  which  M(xg  +d*)=0.  However 

the  model  may  not  always  have  a  real  root,  as  is  obvious  by  considering  the  one 
dimensional  case,  and  in  this  case  it  seems  reasonable  to  find  a  d*  that  minim¬ 
izes  \\M{xs  +d)l]g.  Thus  in  general  we  need  an  efficient  procedure  for  solving 

minimize  ||#(arc+d)||2  (5,0) 

for  M{xg  +d)  given  by  (5.7).  Schnabel  and  Frank  show  how  to  reduce  (5.8)  to  a 
much  smaller  problem  of  the  form 

minimize  ||9(z)||s,  Q:R^-*R'^  (5  0) 

'  ■  ' 

where  the  q  equations  in  p  unknowns  Q{z)  are  quadratic  eindp^g^n,  with  usu- 
ally  g=p.  The  reduction  is  accomplished  using  orthogonal  transformations  of 
the  variable  and  function  spaces.  It  requires  the  QR  decomposition  of  F^{xg) 
which  also  usually  is  used  in  a  Newton's  method  algorithm  and  takes  2nV  3  each 
multiplications  and  additions;  the  next  leading  term  in  the  reduction  is  2n^ 
^  2n®  ®  additional  multiplications  and  additions  which  is  insignificant  in  com¬ 
parison. 

Now  to  find  a  root  or  minimizer  of  the  tensor  model  (5.7)  we  still  must  solve 
the  nonlinear  problem  (5.9);  however,  this  is  inexpensive  due  to  the  reduced 
number  of  variables  in  (5.9).  To  solve  (5.9)  by  a  nonlinear  least  squares  algo¬ 
rithm  costs  Oip^q)  arithmetic  operations  per  iteration,  and  in  practice  a  max¬ 
imum  of  2p  iterations  suffices.  Thus  the  total  cost  of  solving  (5.9)  is  Oip^q) 
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operations,  which  is  £0(n^)  in  the  normal  case  when  p=q,  and  <C»(n2®)  in  all 
cases.  Given  the  solution  to  (5.9),  the  solution  to  (5.B)  is  obtained  by  backsolv- 
ing  and  requires  0(n^)  arithmetic  operations.  Therefore  the  total  cost  of  finding 
a  root  or  minimizer  of  the  tensor  model  is  not  significantly  more  than  calculat¬ 
ing  the  root  of  the  standar-d  linear  model  using  a  QJ?  factorization.  It  is  twice  as 
expensive  as  calculating  the  root  of  the  standard  linear  model  using  Gaussian 
elimination;  however  many  production  codes  do  use  the  QB  factorization 
because  it  facilitates  the  global  portion  of  the  algorithm  and  the  modifications 
required  when  ■F'(xc)  is  ill-conditioned  (see  Dennis  and  Schnabel  [1983]). 

An  important  point  is  that  the  model  (5.7)  may  have  isolated  solutions  even 
when  /^(xc)  is  singular  and  F(xg)  is  not  contained  in  the  subspace  spanned  by 
the  columns  of  F'(xc):  if  so  the  method  of  Schnabel  and  Frank  should  find  a  solu¬ 
tion.  Similarly,  (5.8)  often  is  not  an  ill-conditioned  problem  when  the  solution  of 
the  linear  model  would  be.  It  is  hoped  that  these  properties  are  beneficial  on 
problems  where  /^(x»)  is  singular.  Another  interesting  issue  is  that  (5.7)  usually 
has  multiple  roots  or  minimizers;  we  try  to  find  the  c£»  closest  to  the  Newton 
step,  by  appropriately  choosing  the  starting  point  in  the  algorithm  that 

solves  (5.9). 

Schnabel  and  Freink  have  tested  an  cdgorithm  based  on  a  preliminary  ver¬ 
sion  of  the  above  techniques.  At  each  iteration,  after  determining  the  solution 
to  (5.8),  it  takes  =Xc+d^  if  this  is  an  acceptable  next  iterate.  If  not,  it 
does  a  line  search  in  the  Newton  direction  if  d*  is  not  a  descent  direction,  or 
occasionally  if  it  prefers  the  Newton  direction  for  other  reasons  given  in  Frank 
[1982],  otherwise  it  does  a  line  search  in  the  direction  d*.  Schnabel  and  Frank 
compare  their  algorithm  to  an  algorithm  that  uses  the  standard  linear  model 
exclusively  but  otherwise  is  identical  to  the  tensor  algorithm,  using  the  test 
problems  in  More’,  Garbow,  and  Hillstrom  [1981].  For  all  but  one  of  these  prob¬ 
lems  (Powell’s  singular  function)  F'(xk)  is  nonsingular,  so  they  also  construct 


2B 


singular  versions  of  the  problems  in  More',  Garbow,  and  Hilistrom,  as  described 
in  Schnabel  eind  Frank  [1962].  The  dimensions  of  the  problems  range  from  2  to 
50,  with  most  of  the  problems  having  dimension  between  10  and  50. 

Detailed  results  of  these  tests  are  given  in  Freink  [1982]  and  in  Schnabel 
and  Freink  [1962].  In  summary,  on  44  problems  where  F'(a:„)  is  nonsingular,  the 
tensor  method  requires  fewer  iterations  and  function  and  Jacobian  evaluations 
than  the  standard  method  in  30  cases,  the  same  number  in  13  cases,  and  more 
in  1  case.  The  improvements  rarely  are  by  more  them  20%,  however  most  of 
these  problems  are  quite  easy  when  using  emalytic  Jaeobians,  requiring  10  or 
fewer  iterations.  On  a  set  of  37  problems  where  has  rank  n-1,  the  tensor 

method  requires  fewer  iterations  and  function  and  Jacobian  evaluations  in  26 
cases,  the  same  number  in  8  cases,  and  more  in  3  cases.  Here  the  improvement 
by  the  tensor  method  is  far  more  dramatic;  the  standard  method  fails  to  solve 
11  of  the  37  problems  in  150  iterations,  while  the  tensor  method  solves  all  of 
these  in  at  most  27  iterations.  On  a  second  differently  constructed  set  of  25 
problems  where  again  has  rank  n— 1,  the  tensor  model  requires  fewer 

iterations  and  function  and  Jacobian  evaluations  in  all  cases.  The  standard 
method  solves  24  of  these;  on  these  problems,  the  tensor  algorithm  requires  an 
average  of  53%  of  the  iterations  and  58%  of  the  function  evaluations  used  by  the 
standard  algorithm.  Finally,  on  37  problems  where  has  rank  n-2,  the 

tensor  model  again  does  better  in  26  cases,  the  same  in  8.  and  worse  in  3,  again 
with  substantial  improvements  in  many  cases.  Clearly  these  results  show  prom¬ 
ise  for  the  tensor  methods. 
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6.  Future  applications  of  conic  and  tensor  models 

Methods  that  use  conic  and  tensor  models  are  still  in  their  infancy. 
Depending  upon  their  success,  conic  and  tensor  models  might  eventually  be 
used  in  many  areas  of  optimization  besides  the  ones  described  in  Sections  4  and 
5.  In  this  section  we  comment  briefly  on  some  of  these  possibilities. 

Conic  models  or  collinear  scalings  do  not  seem  suited  to  the  nonlinear  equa¬ 
tions  problem.  The  collinear  transformation  of  the  standard  linear  model  for 
nonlineeir  equations. 

M{x,-^d)  =  F{x,)  +  (6.1) 

suffers  from  the  fact  that  its  root  is  in  the  Newton  direction  —F'{xc)~^F{xc)  for 
all  values  of  bg ;  thus,  only  a  steplength  parameter  is  introduced.  On  the  other 
hand,  conic  models  certainly  could  be  applied  to  nonlinear  least  squares  or  con¬ 
strained  optimization  problems.  They  eilso  may  be  useful  for  solving  special 
classes  of  optimization  problems,  for  example  penalty  functions  where  the  hor¬ 
izon  of  the  conic  function  might  help  reflect  the  shape  of  the  penalty  function. 

Hopefully,  tensor  models  can  be  applied  to  ail  four  problem  classes  listed  in 
Section  2.  derivative  and  secant  methods  for  both  nonlinear  equations  and 
unconstrained  minimization.  In  fact,  our  own  main  practical  interest  would  be 
in  a  secant  tensor  method  that  quickly  solves  unconstrained  minimization  prob¬ 
lems  with  vV(x*)  singular  or  ill-conditioned,  because  we  see  many  such  prob¬ 
lems  in  practice.  An  example  is  overpareimeterized  data  fitting  problems,  where 
In  our  experience  the  objective  function  usually  is  sufficiently  expensive  to 
evaluate  that  secant  methods  are  preferred.  The  extension  of  the  method  of 
Section  5  to  a  secant  method  for  nonlinear  equations,  and  the  extension  to 
unconstrained  minimization,  both  appear  to  present  challenging  problems.  The 
tensor  method  of  Section  5  generalizes  virtually  without  change  into  a  Gauss- 


30 


Newton  or  Levenberg-Marquardt  type  method  for  solving  the  nonlinear  least 
squares  problem,  and  since  the  analytic  or  finite  difference  Jacobian  always  is 
used  in  this  setting,  such  a  method  would  be  of  practical  interest.  Tensor 
methods  also  may  be  useful  in  finding  complex  solutions  to  nonlinear  equations 
or  optimization  problems. 

Finally,  there  certainly  are  other  nonstandard  models  besides  conics  and 
tensors  that  could  be  considered  for  nonlinear  equations  or  optimization  prob¬ 
lems.  The  main  contribution  of  conic  and  tensor  models  may  be  that  they  cause 
researchers  to  consider  various  nonstandard  models,  and  that  some  of  these 
prove  to  be  useful  in  practice. 
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